AITopics | partitioned data

Collaborating Authors

partitioned data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Communication Complexity of Distributed Convex Learning and Optimization

Yossi Arjevani, Ohad Shamir

Neural Information Processing SystemsOct-2-2025, 09:39:08 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Fusion Sampling Validation in Data Partitioning for Machine Learning

Udomboso, Christopher Godwin, Sigauke, Caston, Adinya, Ini

arXiv.org Artificial IntelligenceAug-5-2025

Effective data partitioning is known to be crucial in machine learning. Traditional cross-validation methods like K-Fold Cross-Validation (KFCV) enhance model robustness but often compromise generalisation assessment due to high computational demands and extensive data shuffling. To address these issues, the integration of the Simple Random Sampling (SRS), which, despite providing representative samples, can result in non-representative sets with imbalanced data. The study introduces a hybrid model, Fusion Sampling Validation (FSV), combining SRS and KFCV to optimise data partitioning. FSV aims to minimise biases and merge the simplicity of SRS with the accuracy of KFCV. The study used three datasets of 10,000, 50,000, and 100,000 samples, generated with a normal distribution (mean 0, variance 1) and initialised with seed 42. KFCV was performed with five folds and ten repetitions, incorporating a scaling factor to ensure robust performance estimation and generalisation capability. FSV integrated a weighted factor to enhance performance and generalisation further. Evaluations focused on mean estimates (ME), variance estimates (VE), mean squared error (MSE), bias, the rate of convergence for mean estimates (ROC\_ME), and the rate of convergence for variance estimates (ROC\_VE). Results indicated that FSV consistently outperformed SRS and KFCV, with ME values of 0.000863, VE of 0.949644, MSE of 0.952127, bias of 0.016288, ROC\_ME of 0.005199, and ROC\_VE of 0.007137. FSV demonstrated superior accuracy and reliability in data partitioning, particularly in resource-constrained environments and extensive datasets, providing practical solutions for effective machine learning implementations.

artificial intelligence, machine learning, variance, (15 more...)

arXiv.org Artificial Intelligence

2508.01325

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Communication Complexity of Distributed Convex Learning and Optimization

Neural Information Processing SystemsMar-13-2024, 00:45:33 GMT

We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered. We identify cases where existing algorithms are already worst-case optimal, as well as cases where room for further improvement is still possible. Among other things, our results indicate that without similarity between the local objective functions (due to statistical data similarity or otherwise) many communication rounds may be required, even if the machines have unbounded computational power.

algorithm, communication round, local function, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

Jafarigol, Elaheh, Trafalis, Theodore, Razzaghi, Talayeh, Zamankhani, Mona

arXiv.org Artificial IntelligenceNov-17-2023

In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided.

application, federated learning, learning, (12 more...)

arXiv.org Artificial Intelligence

2311.10832

Country:

North America > United States > Oklahoma (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(5 more...)

Add feedback

Vertical Federated Learning: A Structured Literature Review

Khan, Afsana, Thij, Marijn ten, Wilbik, Anna

arXiv.org Artificial IntelligenceApr-9-2023

Federated Learning (FL) has emerged as a promising distributed learning paradigm with an added advantage of data privacy. With the growing interest in having collaboration among data owners, FL has gained significant attention of organizations. The idea of FL is to enable collaborating participants train machine learning (ML) models on decentralized data without breaching privacy. In simpler words, federated learning is the approach of ``bringing the model to the data, instead of bringing the data to the mode''. Federated learning, when applied to data which is partitioned vertically across participants, is able to build a complete ML model by combining local models trained only using the data with distinct features at the local sites. This architecture of FL is referred to as vertical federated learning (VFL), which differs from the conventional FL on horizontally partitioned data. As VFL is different from conventional FL, it comes with its own issues and challenges. In this paper, we present a structured literature review discussing the state-of-the-art approaches in VFL. Additionally, the literature review highlights the existing solutions to challenges in VFL and provides potential research directions in this domain.

artificial intelligence, federated learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2212.00622

Country:

Europe > Netherlands > Limburg > Maastricht (0.04)
Asia > China (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.69)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data

Castiglia, Timothy, Das, Anirban, Wang, Shiqiang, Patterson, Stacy

arXiv.org Artificial IntelligenceMar-28-2023

We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over $90\%$ without a significant decrease in accuracy over VFL without compression.

artificial intelligence, compression, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2206.0833

Country:

North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values

van Daalen, Florian, Ippel, Lianne, Dekker, Andre, Bermejo, Inigo

arXiv.org Artificial IntelligenceOct-31-2022

Federated learning makes it possible to train a machine learning model on decentralized data. Bayesian networks are probabilistic graphical models that have been widely used in artificial intelligence applications. Their popularity stems from the fact they can be built by combining existing expert knowledge with data and are highly interpretable, which makes them useful for decision support, e.g. in healthcare. While some research has been published on the federated learning of Bayesian networks, publications on Bayesian networks in a vertically partitioned or heterogeneous data setting (where different variables are located in different datasets) are limited, and suffer from important omissions, such as the handling of missing data. In this article, we propose a novel method called VertiBayes to train Bayesian networks (structure and parameters) on vertically partitioned data, which can handle missing values as well as an arbitrary number of parties. For structure learning we adapted the widely used K2 algorithm with a privacy-preserving scalar product protocol. For parameter learning, we use a two-step approach: first, we learn an intermediate model using maximum likelihood by treating missing values as a special value and then we train a model on synthetic data generated by the intermediate model using the EM algorithm. The privacy guarantees of our approach are equivalent to the ones provided by the privacy preserving scalar product protocol used. We experimentally show our approach produces models comparable to those learnt using traditional algorithms and we estimate the increase in complexity in terms of samples, network size, and complexity. Finally, we propose two alternative approaches to estimate the performance of the model using vertically partitioned data and we show in experiments that they lead to reasonably accurate estimates.

artificial intelligence, bayesian network, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.17228

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Netherlands > Limburg > Maastricht (0.05)
Oceania > Australia (0.04)
(7 more...)

Genre: Research Report > Promising Solution (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (0.69)
Health & Medicine > Health Care Providers & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Day 32: 60 days of Data Science and Machine Learning Series

#artificialintelligenceJun-6-2022, 16:45:50 GMT

Do you recommend Recommendation Engines? Do you recommend Recommendation Engines? Identity Resolution is More Valuable When it's on Your Warehouse Identity Resolution is More Valuable When it's on Your Warehouse

data science, recommendation engine, science and machine learning series, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization

Xie, Lunchen, Liu, Jiaqi, Lu, Songtao, Chang, Tsung-hui, Shi, Qingjiang

arXiv.org Artificial IntelligenceMay-12-2021

XGBoost is one of the most widely used machine learning models in the industry due to its superior learning accuracy and efficiency. Targeting at data isolation issues in the big data problems, it is crucial to deploy a secure and efficient federated XGBoost (FedXGB) model. Existing FedXGB models either have data leakage issues or are only applicable to the two-party setting with heavy communication and computation overheads. In this paper, a lossless multi-party federated XGB learning framework is proposed with a security guarantee, which reshapes the XGBoost's split criterion calculation process under a secret sharing setting and solves the leaf weight calculation problem by leveraging distributed optimization. Remarkably, a thorough analysis of model security is provided as well, and multiple numerical results showcase the superiority of the proposed FedXGB compared with the state-of-the-art models on benchmark datasets.

opération, participant, xgboost, (14 more...)

arXiv.org Artificial Intelligence

2105.05717

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)
Europe > Czechia (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Add feedback

Privacy-preserving Data Sharing on Vertically Partitioned Data

Tajeddine, Razane, Jälkö, Joonas, Kaski, Samuel, Honkela, Antti

arXiv.org Machine LearningOct-19-2020

In this work, we present a method for differentially private data sharing by training a mixture model on vertically partitioned data, where each party holds different features for the same set of individuals. We use secure multi-party computation (MPC) to combine the contribution of the data from the parties to train the model. We apply the differentially private variational inference (DPVI) for learning the model. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of individual components of each party. Therefore, each party can calculate its shares on its own without the use of MPC. Then MPC is only needed to get the product between the different shares and add the noise. Applying the method to demographic data from the US Census, we obtain comparable accuracy to the non-partitioned case with approximately 20-fold increase in computing time.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2010.09293

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback